Open World Probabilistic Databases (Extended Abstract)

Authors

  • Ismail Ilkan Ceylan
  • Adnan Darwiche
  • Guy Van den Broeck
Abstract

Introduction and Motivation

Driven by the need to learn from vast amounts of text data, efforts throughout natural language processing, information extraction, databases, and AI are coming together to build large-scale knowledge bases. Academic systems such as NELL [14], Reverb [7], Yago [11], and DeepDive [16] continuously crawl the web to extract relational information. Industry projects such as Microsoft’s Probase [18] or Google’s Knowledge Vault [6] similarly learn structured data from text to improve search products. Notably, such knowledge bases are inherently probabilistic, and many of them [6, 16] are based on the foundations of tuple-independent probabilistic databases (PDBs) [17]. According to the PDB semantics, each database tuple is an independent Bernoulli random variable, and all other tuples have probability zero, enforcing a closed-world assumption (CWA) [15].

This paper revisits the choice of the CWA in probabilistic knowledge bases. We observe that the CWA is violated in their deployment, which makes it problematic to reason, learn, or mine on top of these databases. First, knowledge bases are part of a larger machine learning loop that continuously updates beliefs about facts based on new textual evidence. From a Bayesian learning perspective [2], this loop can only be principled when learned facts have an a priori non-zero probability. Hence, the CWA does not accurately represent this mode of operation and puts it on weak footing. Second, these issues are not temporary: it will never be possible to complete probabilistic knowledge bases of even the most trivial relations, as the memory requirements quickly become excessive. This already manifests today: statistical classifiers output facts at a high rate, but only the most probable ones make it into the knowledge base, and the rest are truncated, losing much of the statistical information. Third, query answering under the CWA does not take into account the effect the open world can have on the query probability. This makes it impossible to distinguish queries whose probability should intuitively differ.

These issues stand in the way of some principled approaches to knowledge base completion and mining. We propose an alternative semantics for probabilistic knowledge bases to address these problems, which results in open-world PDBs (OpenPDBs). We show that OpenPDBs provide more meaningful answers. Finally, we pinpoint limitations of OpenPDBs and discuss ontology-based data access (OBDA) as a promising approach to further strengthen this framework.
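
The first and third points above can be illustrated with a small Python sketch. This is not the authors' implementation: the toy facts, the function names, and the upper bound `LAM` on the probability of unlisted tuples are all assumptions made for illustration, since the abstract does not specify how OpenPDBs assign probabilities to open tuples. The sketch shows that a zero prior can never be revised by Bayes' rule, and that two conjunctive queries that both evaluate to zero under the CWA can be told apart once unlisted tuples are allowed a small probability.

```python
def bayes_update(prior, lik_if_true, lik_if_false):
    """Posterior probability of a fact after one piece of evidence (Bayes' rule)."""
    evidence = prior * lik_if_true + (1.0 - prior) * lik_if_false
    return prior * lik_if_true / evidence if evidence > 0.0 else 0.0


def prob_all(pdb, facts, default=0.0):
    """P(every fact in `facts` holds) for distinct, independent tuples.

    Facts missing from `pdb` get probability `default`;
    default=0.0 reproduces the closed-world assumption.
    """
    p = 1.0
    for fact in facts:
        p *= pdb.get(fact, default)
    return p


# Hypothetical toy database: ground fact -> marginal probability.
pdb = {
    ("friend", "ann", "bob"): 0.8,
    ("friend", "bob", "cat"): 0.5,
}

# Two conjunctive queries over ground facts (illustrative only).
q1 = [("friend", "ann", "bob"), ("friend", "ann", "cat")]  # one listed, one unlisted
q2 = [("friend", "dan", "eve"), ("friend", "eve", "fay")]  # both unlisted

# Assumed upper bound on the probability of any unlisted tuple.
LAM = 0.3

# Closed world: any unlisted tuple forces the conjunction to zero.
cwa_q1 = prob_all(pdb, q1)
cwa_q2 = prob_all(pdb, q2)

# Open world (sketch): a conjunction is monotone in each tuple probability,
# so letting unlisted tuples range over [0, LAM] yields an interval.
open_q1 = (prob_all(pdb, q1, 0.0), prob_all(pdb, q1, LAM))
open_q2 = (prob_all(pdb, q2, 0.0), prob_all(pdb, q2, LAM))

if __name__ == "__main__":
    print("CWA:", cwa_q1, cwa_q2)
    print("Open-world intervals:", open_q1, open_q2)
    # A fact with prior zero stays at zero, however strong the evidence.
    print("Posterior from zero prior:", bayes_update(0.0, 0.99, 0.01))
    print("Posterior from small prior:", bayes_update(0.01, 0.99, 0.01))
```

Under these assumptions both queries have probability zero in the closed world, while their open-world upper bounds differ because `q1` is partly supported by a listed tuple. Reporting an interval rather than a point value is a design choice of this sketch.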

Related Resources

Open-World Probabilistic Databases: An Abridged Report

Large-scale probabilistic knowledge bases are becoming increasingly important in academia and industry alike. They are constantly extended with new data, powered by modern information extraction tools that associate probabilities with database tuples. In this paper, we revisit the semantics underlying such systems. In particular, the closed-world assumption of probabilistic databases, that fact...

Open-World Probabilistic Databases

Large-scale probabilistic knowledge bases are becoming increasingly important in academia and industry alike. They are constantly extended with new data, powered by modern information extraction tools that associate probabilities with database tuples. In this paper, we revisit the semantics underlying such systems. In particular, the closed-world assumption of probabilistic databases, that fact...

Ontology-Mediated Queries for Probabilistic Databases (Extended Abstract)

The semantics of large-scale knowledge bases like NELL and Google’s Knowledge Vault is founded on (tuple-independent) probabilistic databases (PDBs) [3]. As for ordinary databases, they employ the closed-world assumption, i.e., missing facts are treated as being false (having the probability 0), which leads to unintuitive results when querying PDBs. Recently, open-world probabilistic databases ...

A Fuzzy Probabilistic Relational Database Model and Algebra

This paper describes an extended relational database model based on probability theory and possibility theory. Fuzzy information and probabilistic information are incorporated into the relational databases simultaneously to represent fuzzy probability of events in the real-world applications. The tuples in such a relation are associated with a possibility distribution, and their attribute value...

Ontology-Mediated Queries for Probabilistic Databases

Probabilistic databases (PDBs) are usually incomplete, e.g., contain only the facts that have been extracted from the Web with high confidence. However, missing facts are often treated as being false, which leads to unintuitive results when querying PDBs. Recently, open-world probabilistic databases (OpenPDBs) were proposed to address this issue by allowing probabilities of unknown facts to tak...

Publication date: 2016